An empirical Bayes mixture model for SNP detection in pooled sequencing data
Abstract
MOTIVATION: Detecting single-nucleotide polymorphisms (SNPs) in pooled sequencing data is more challenging than in individual sequencing because of sampling variation across pools. To differentiate SNP signal from sequencing error effectively, the sequencing error must be estimated appropriately. In this article, we propose an empirical Bayes mixture (EBM) model for SNP detection and allele frequency estimation in pooled sequencing data.
RESULTS: The proposed model reliably learns the error distribution by pooling information across pools and genomic positions. In addition, the proposed EBM model builds in characteristics unique to pooled sequencing data, boosting the sensitivity of SNP detection. For large-scale inference in SNP detection, the EBM model provides a flexible and robust way to estimate and control the local false discovery rate. We demonstrate the performance of the proposed method through simulation studies and an application to real data.
AVAILABILITY: An implementation of this method is available at https://sites.google.com/site/zhouby98.
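To make the notion of a local false discovery rate concrete, here is a minimal sketch under strong simplifying assumptions. It is not the EBM model from the paper: it assumes a two-component binomial mixture in which the null (error-only) proportion `pi0`, the per-base error rate `err`, and the non-reference read fraction at a true SNP `alt_frac` are all known constants, whereas the paper estimates the error distribution from the data. The function names and default values are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def local_fdr(alt, depth, pi0=0.99, err=0.01, alt_frac=0.10):
    """Posterior probability that a position is error-only (null).

    alt      -- number of non-reference reads at the position
    depth    -- total number of reads at the position
    pi0      -- assumed prior proportion of null positions (hypothetical)
    err      -- assumed sequencing error rate (hypothetical)
    alt_frac -- assumed non-reference fraction at a true SNP (hypothetical)
    """
    f0 = binom_pmf(alt, depth, err)       # likelihood under error only
    f1 = binom_pmf(alt, depth, alt_frac)  # likelihood under a true SNP
    return pi0 * f0 / (pi0 * f0 + (1 - pi0) * f1)


if __name__ == "__main__":
    # Print the local fdr for a few non-reference read counts at depth 100.
    for alt in (0, 2, 5, 15):
        print(alt, local_fdr(alt, 100))
```

A position would be called a SNP when its local fdr falls below a chosen cutoff. In this toy setting the value decreases as the non-reference read count grows, which is the behaviour a SNP caller relies on.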
Similar resources
EMPIRICAL BAYES ANALYSIS OF TWO-FACTOR EXPERIMENTS UNDER INVERSE GAUSSIAN MODEL
A two-factor experiment with interaction between factors wherein observations follow an Inverse Gaussian model is considered. Analysis of the experiment is approached via an empirical Bayes procedure. The conjugate family of prior distributions is considered. Bayes and empirical Bayes estimators are derived. Application of the procedure is illustrated on a data set, which has previously been an...
A cross-sample statistical model for SNP detection in short-read sequencing data
Highly multiplex DNA sequencers have greatly expanded our ability to survey human genomes for previously unknown single nucleotide polymorphisms (SNPs). However, sequencing and mapping errors, though rare, contribute substantially to the number of false discoveries in current SNP callers. We demonstrate that we can significantly reduce the number of false positive SNP calls by pooling informati...
Invariant Empirical Bayes Confidence Interval for Mean Vector of Normal Distribution and its Generalization for Exponential Family
Based on a given Bayesian model of multivariate normal with known variance matrix we will find an empirical Bayes confidence interval for the mean vector components which have normal distribution. We will find this empirical Bayes confidence interval as a conditional form on ancillary statistic. In both cases (i.e. conditional and unconditional empirical Bayes confidence interval), the empiri...
Empirical Bayes Estimation in Nonstationary Markov chains
Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N>=2 independent and identically dis...
Unobserved Heterogeneity in Longitudinal Data An Empirical Bayes Perspective
Abstract. Empirical Bayes methods for Gaussian and binomial compound decision problems involving longitudinal data are considered. A new convex optimization formulation of the nonparametric (Kiefer-Wolfowitz) maximum likelihood estimator for mixture models is used to construct nonparametric Bayes rules for compound decisions. The methods are illustrated with some simulation examples as well as ...
Journal title:
- Bioinformatics
Volume 28, Issue 20
Pages: -
Publication date: 2012